census_api_key("YOUR API KEY GOES HERE", install=TRUE)
Let’s show quickly how the data we used last week from the Assessor can be converted to an sf object.
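As a rough sketch of that conversion, the block below builds a small made-up data frame and turns it into an sf object with `st_as_sf()`. The column names (`pin`, `sale_price`, `longitude`, `latitude`) and the values are hypothetical stand-ins, not the actual Assessor fields, and the coordinates are assumed to be WGS 84 longitude/latitude.

```r
library(sf)

# Hypothetical stand-in for the Assessor data: one row per parcel,
# with longitude/latitude stored as ordinary numeric columns.
parcels <- data.frame(
  pin        = c("A1", "A2", "A3"),
  sale_price = c(250000, 410000, 185000),
  longitude  = c(-87.6298, -87.6500, -87.7000),
  latitude   = c(41.8781, 41.9000, 41.8500)
)

# Convert to an sf object. `coords` names the x and y columns (in that order);
# crs = 4326 declares the coordinates as WGS 84 longitude/latitude.
parcels_sf <- st_as_sf(parcels, coords = c("longitude", "latitude"), crs = 4326)

print(parcels_sf)
```

By default `st_as_sf()` drops the coordinate columns and replaces them with a single `geometry` column; pass `remove = FALSE` to keep them.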
Starting with the 2020 census…let’s look at institutional populations by county.
We want to create an sf object which has the counts/values we are interested in showing joined in.
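One way to picture the join is with toy data: a small sf object of polygons keyed by an identifier, and a plain data frame of counts keyed by the same identifier. Everything below (the square "counties", the `GEOID` values used as keys, and the `inst_pop` counts) is made up for illustration and is not real census output.

```r
library(sf)

# Helper that builds a 1 x 1 square polygon with its lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two toy "county" polygons standing in for real county boundaries.
counties <- st_sf(
  GEOID    = c("17031", "17043"),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# Made-up counts keyed by the same GEOID (not real census values).
counts <- data.frame(GEOID = c("17031", "17043"), inst_pop = c(1200, 300))

# merge() has a method for sf objects, so the result keeps its geometry column.
counties_joined <- merge(counties, counts, by = "GEOID")

print(counties_joined)
```

With tidycensus, passing `geometry = TRUE` to `get_decennial()` returns the values already attached to the geometries, so an explicit join like this is mainly needed when the counts come from a separate table.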
Note that making maps, like making regular plots, requires a thoughtful choice of the area to show, based on how much information a reader can take in at once.
Just the Midwest…
Finding variables
Getting our data together
First leaflet map
Making palette
Add labels/legend
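The three steps above (a first map, a palette, then labels and a legend) might fit together as in the sketch below. It assumes the leaflet package is installed and uses three made-up square "counties" with invented `inst_pop` values; the palette name, colors, and legend title are illustrative choices, not the ones used in class.

```r
library(sf)
library(leaflet)

# Helper that builds a 1 x 1 degree square with its lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Toy polygons with made-up values (not real census counts).
counties <- st_sf(
  name     = c("County A", "County B", "County C"),
  inst_pop = c(1200, 300, 750),
  geometry = st_sfc(square(-88, 41), square(-87, 41), square(-88, 42), crs = 4326)
)

# Palette: a function that maps a numeric value to a hex color.
pal <- colorNumeric(palette = "YlOrRd", domain = counties$inst_pop)

# Map: base tiles, polygons filled by value, hover labels, and a legend.
m <- leaflet(counties) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(inst_pop), fillOpacity = 0.7,
    color = "white", weight = 1,
    label = ~paste0(name, ": ", inst_pop)
  ) %>%
  addLegend(
    position = "bottomright", pal = pal, values = ~inst_pop,
    title = "Institutionalized population"
  )

# Typing `m` at the console (or in a knitted document) renders the map.
```

Because the same `pal` function is passed to both `addPolygons()` and `addLegend()`, the legend colors stay in sync with the fill colors.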
Divvy by community area
## Reading layer `OGRGeoJSON' from data source
## `https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON'
## using driver `GeoJSON'
## Simple feature collection with 77 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## Geodetic CRS: WGS 84
Example from the textbook.
## Reading layer `OGRGeoJSON' from data source
## `https://data.cityofchicago.org/api/geospatial/fthy-xz3r?method=export&format=GeoJSON'
## using driver `GeoJSON'
## Simple feature collection with 25 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64455 xmax: -87.52414 ymax: 42.02303
## Geodetic CRS: WGS 84
Let’s plot the kernel density of these distances.
One might wonder how this distribution of crimes’ distances to the nearest station compares to the overall distribution of distances to the nearest station for all of Chicago. If police station locations are chosen to be in high-crime areas (or if they attract crime), then we would expect the distances between crimes and stations to be concentrated on smaller distances relative to the distances to stations for all of Chicago. If crimes avoid police stations, then we would expect their distribution to be pushed farther out relative to greater Chicago.
Constructing a benchmark. If we believe police station location has some relationship with location of crime, we need to construct a benchmark – some comparison group that gives context. One strategy involves constructing a point vector containing a random or regular set of points placed throughout Chicago. This benchmark makes the assumption that if distance did not matter, then there is an equal chance of crime at every location in Chicago. When we compare the equal chance distribution to the actual crime-distance distribution, we can infer if distance to police station has any relationship with crime.
To construct this equal chance distribution, we first create a single polygon for Chicago by merging the police district polygons (district_sf) with the st_union function. In effect, the borders between the district polygons are removed, leaving a single large city limits boundary. From this new city polygon, we draw a hexagonal grid of points (\(n=10000\) to be exact) using st_sample. Then, the same distance calculations are applied to find the minimum distance from each of the \(n=10000\) points to a police station.
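The sequence just described (union, hexagonal sample, minimum distance) can be sketched on toy data. The two square "districts", the two station points, and the sample size of 200 below are all hypothetical; coordinates are in arbitrary planar units with no CRS, whereas the real analysis would use the Chicago layers in a suitable coordinate system.

```r
library(sf)

# Helper that builds a 1 x 1 square polygon with its lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two adjacent toy "districts" standing in for the police district polygons.
district_toy <- st_sf(
  district = c("1", "2"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)

# Hypothetical station locations, one at the center of each district.
stations <- st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)))

# Dissolve the internal border so the districts become one city polygon.
city <- st_union(district_toy)

# Draw a hexagonal grid of points inside the city. `size` is a target,
# so the number of points returned can differ slightly from it.
set.seed(1)
grid_pts <- st_sample(city, size = 200, type = "hexagonal")

# Distance matrix (rows = grid points, columns = stations),
# then the row-wise minimum gives the distance to the nearest station.
d <- st_distance(grid_pts, stations)
min_dist <- apply(d, 1, min)

summary(min_dist)
```

The grid offset is drawn at random, which is why the sketch sets a seed before sampling.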
We compare the police districts, the grid points and the minimum distance to police station. To an extent, the minimum distance to police station loosely follows the boundaries of the police districts – there are both areas that are well-covered and others that are far less so. We can see that most of Chicago is within 5 km of a police station – with the exception of the airport in the northwestern corner.
Comparing distance distributions. To answer the original question, we plot kernel densities of the two distance distributions. The vast majority of crimes (red line) in this data set occur within 2.5 km of a police station, while our sample of points from all of Chicago has a lower density in this distance range. This means that crimes tend to occur closer to police stations than locations across Chicago as a whole do. Is this causal? It is hard to draw a firm conclusion.
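A comparison of this kind can be sketched with base R's `density()`. The distances below are simulated from gamma distributions purely for illustration; they are not the Chicago data, and the distribution parameters are arbitrary assumptions chosen so that the "crime" distances sit closer to zero.

```r
set.seed(42)

# Synthetic distances in km (made up for illustration, not the Chicago data).
crime_dist <- rgamma(2000, shape = 2, rate = 1.6)  # stand-in for crime-to-station distances
grid_dist  <- rgamma(2000, shape = 2, rate = 1.0)  # stand-in for grid-point-to-station distances

# Kernel density estimates, evaluated from 0 since distances cannot be negative.
dens_crime <- density(crime_dist, from = 0)
dens_grid  <- density(grid_dist, from = 0)

# Overlay the two curves on common axes.
plot(dens_grid, col = "black", main = "Distance to nearest station",
     xlab = "Distance (km)", ylim = range(0, dens_crime$y, dens_grid$y))
lines(dens_crime, col = "red")
legend("topright", legend = c("Grid points", "Crimes"),
       col = c("black", "red"), lty = 1)

# Share of each group within 2.5 km, as a numeric complement to the plot.
share_crime <- mean(crime_dist <= 2.5)
share_grid  <- mean(grid_dist <= 2.5)
c(crimes = share_crime, grid = share_grid)
```

Reporting the share of points under a threshold alongside the curves gives a single number to compare, which is easier to read than the relative heights of two density lines.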
There are many factors that could contribute to this trend. Perhaps police stations are placed in high-crime neighborhoods, or perhaps police concentrate more effort on areas near the station. Drawing a causal inference is quite challenging without an experimental design.
Nonetheless, it may be informative to focus on a few crime types and other attributes of the data. Perhaps whether the likelihood of an arrest differs with distance from a police station offers a clue. We compare the distance distributions of narcotics incidents that led to an arrest and those that did not. Interestingly, we see some evidence that the chance of an arrest could depend upon the distance between the incident and the police station. That said, there are many other relationships that we would want to explore more deeply before making any decisions.